Abstract. The function $\omega(A) = \dfrac{\operatorname{trace}(A)/n}{\det(A)^{1/n}}$ is introduced as a measure of the deviation of a positive-definite matrix from the identity. This appears to be a more uniform measure than the standard $\ell_2$ condition number, since it takes all the eigenvalues of $A$ into account. Optimal quasi-Newton updates are given with respect to various applications of this measure. This yields the inverse-sized BFGS and sized DFP updates suggested by Oren and Luenberger, and it gives rise to a new one-parameter class of updates based on these two updates, just as the Broyden class is based on the BFGS and DFP updates. Also considered are alternatives to sizing after the first step. This leads to some interesting weighted Frobenius norm problems for weak forms of the secant condition and a particular Fletcher dual pair in the Broyden class.
1 Introduction


We consider quasi-Newton methods for the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where $f$ is twice continuously differentiable. The methods use a local quadratic model of the form

$$f(x_c + s) \approx f(x_c) + g_c^T s + \tfrac{1}{2} s^T B_c s, \tag{2}$$

where $x_c$ is the current approximation to a minimizer $x^*$, $B_c$ is the current approximation to the true Hessian $G$ at $x_c$, and $g_c$ is the gradient at $x_c$. We use the notation $B^{-1} = H$. Our particular interest is in secant-type methods, which approximate Newton's method by accumulating Hessian approximations from gradient differences. These methods have the property that the next Hessian approximation $B_+$ satisfies the secant condition

$$B_+ s = y = g_+ - g_c \quad \text{or} \quad H_+ y = s = x_+ - x_c.$$

The  best  known  class  of  such  approximations  is  the  Broyden  family  of  updates.  The  best  known 
members  of  this  family  are  the  BFGS  and  the  DFP  methods.  We  will  not  restrict  ourselves  to 
the  Broyden  family. 

Using the standard $\ell_2$ measure of conditioning $\kappa$, Davidon [1] found optimal updates within the Broyden family of rank-two updates. Specifically, Davidon chooses a member $B_+(\phi)$ of the Broyden class that minimizes $\kappa(H_c B_+(\phi))$. Although it is common to call these methods optimally conditioned, we think it is more salient, as well as more in the mainstream of the subject, to view $\kappa(H_c B_+(\phi))$ as a measure of the change made in the Hessian approximation by the update. This is because we like to view $\kappa$ as a measure of the deviation of a matrix from a multiple of the identity.

In this paper we use a different measure $\omega$ of the deviation of a matrix from the identity. To us, this measure seems more relevant to the updating context than $\kappa$. Furthermore, it is very similar to the measure used in the proofs in [2]. For some interesting results on that specific measure, see [3]. We give some properties of $\omega$ and relate it to $\kappa$ in Section 2. In Section 3, we find least-change secant updates from the Broyden class using $\omega(H_c B_+)$ and $\omega(B_c H_+)$. These results are interesting, but they mainly serve as lemmas for our main results in Section 5, where we give very interesting connections between updates that minimize the measures $\omega(H_c B_+)$ and $\omega(B_c H_+)$ and the so-called Oren-Luenberger [5] scaling.

In order to interpret the results of Section 3, and to prepare for the results of Section 5, we give some results on Oren-Luenberger scaling in Section 4. Also, since Oren-Luenberger scaling, which we prefer to call sizing, is generally regarded as useful only in the first step of an iteration [6], we look for an alternative for subsequent iterations. This leads us to some interesting weighted Frobenius norm problems for the weak forms of the secant condition

$$s^T B_+ s = s^T y \quad \text{or} \quad y^T H_+ y = y^T s.$$

These problems are solved, with some surprises, in Section 4 by rank-one updates, which we call weak secant updates.

In Section 5, we bring together sizing and weak and strong least-change secant updating in $\omega$ and in the traditional weighted Frobenius norms associated with the DFP and BFGS methods. This leads to even stronger connections between sizing, $\omega$-least-change secant updates, and the DFP and BFGS methods. Weak updating followed by weighted Frobenius updating leads to a pair of Fletcher-dual members of the Broyden class which we have not seen identified before, but which resemble the Hoshino update [7].

In Section 6, we consider the special two-dimensional example of Powell in [8], used there to illustrate that the BFGS behaves better than the DFP. We show that the $\omega$-least-change secant method for this special case is a sized DFP, or equivalently an inverse-sized BFGS, and we give the corresponding numerical results for the sized DFP. These results are better than those for the BFGS. We also include some numerical tests for other problems.
2 Preliminaries


Let $A$ be an $n \times n$ symmetric positive definite (s.p.d.) matrix with eigenvalues

$$\lambda_1 \ge \cdots \ge \lambda_n > 0 \tag{3}$$

and corresponding eigenvectors $u_1, \ldots, u_n$. Then the usual $\ell_2$ condition number of $A$ is given by

$$\kappa(A) = \lambda_1 / \lambda_n. \tag{4}$$


The condition number $\kappa(A)$ is used in perturbation analysis for matrix inversion. For example, for the systems of linear equations $Ax = b$ and $A\bar x = \bar b$, we obtain bounds on the relative differences

$$\frac{1}{\kappa(A)}\,\frac{\|b - \bar b\|}{\|b\|} \;\le\; \frac{\|x - \bar x\|}{\|x\|} \;\le\; \kappa(A)\,\frac{\|b - \bar b\|}{\|b\|}; \tag{5}$$

see, e.g., [9]. This condition number acts as an upper bound on the factor by which the relative change in the right-hand side is amplified in the relative change in the solution.

It has often been noted that $\kappa$ depends only on the largest and smallest eigenvalues. We propose using the following measure, which depends more uniformly on all the eigenvalues:

$$\omega(A) = \frac{\operatorname{trace}(A)/n}{\det(A)^{1/n}}. \tag{6}$$

Just as the usual condition number $\kappa(A)$ measures how close $A$ is to the pencil $\alpha I$, where $I$ is the identity matrix and $\alpha > 0$, the function $\omega(A)$ similarly measures the 'distance' of $A$ from $\alpha I$. Since $\omega(A)$ is the ratio of the arithmetic mean to the geometric mean of the eigenvalues of $A$, it takes all of them into account and so is a more uniform, or average, indicator of this distance.

Moreover, $\omega(A)$ can be calculated in terms of the entries of $A$ rather than its spectrum, and it can be more easily differentiated and manipulated.
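As a quick illustration of how (4) and (6) differ, here is a minimal NumPy sketch; the helper names `omega` and `kappa` and the two diagonal test matrices are our own choices, not from the paper. Both matrices have the same extreme eigenvalues, and hence the same $\kappa$, but $\omega$ (the ratio of the arithmetic to the geometric mean of the eigenvalues) distinguishes them.

```python
import numpy as np

def omega(A):
    """omega(A) = (trace(A)/n) / det(A)**(1/n), eq. (6); A assumed s.p.d."""
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)      # log-determinant avoids overflow
    if sign <= 0:
        raise ValueError("A must be positive definite")
    return (np.trace(A) / n) / np.exp(logdet / n)

def kappa(A):
    """l2 condition number lambda_1 / lambda_n, eq. (4)."""
    lam = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
    return lam[-1] / lam[0]

# Two illustrative matrices with the same extreme eigenvalues (same kappa)
A1 = np.diag([100.0, 1.0, 1.0, 1.0, 1.0])          # one large eigenvalue
A2 = np.diag([100.0, 100.0, 100.0, 100.0, 1.0])    # one small eigenvalue

for A in (A1, A2):
    lam = np.linalg.eigvalsh(A)
    am_over_gm = lam.mean() / np.exp(np.log(lam).mean())
    print(f"kappa = {kappa(A):.1f}, omega = {omega(A):.4f}, AM/GM = {am_over_gm:.4f}")
```

Here $\kappa$ cannot tell the two matrices apart, whereas $\omega$ reflects how many eigenvalues sit far from the others.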

We now give some useful properties of $\omega$ and address some related issues. Here we restrict ourselves to s.p.d. matrices and use the Löwner order, i.e., $A \succeq B$ means that $A - B$ is positive semidefinite; see [10, p. 475]. Recall that a differentiable function $f$ is pseudoconvex if

$$(y - x)^T \nabla f(x) \ge 0 \;\Longrightarrow\; f(y) \ge f(x); \tag{7}$$

see [11].


Proposition 2.1 The measure $\omega(A)$ satisfies the following.

(i)
$$1 \le \omega(A) \le \kappa(A) < \frac{(\kappa(A) + 1)^2}{\kappa(A)} \le 4\,\omega^n(A),$$
with equality in the first and second inequalities if and only if $A$ is a multiple of the identity, and equality in the last inequality if and only if
$$\lambda_2 = \cdots = \lambda_{n-1} = \frac{\lambda_1 + \lambda_n}{2}; \tag{8}$$

(ii) $\omega(\alpha A) = \omega(A)$ for all $\alpha > 0$;

(iii) if $n = 2$, $\omega(A)$ is isotonic with $\kappa(A)$;

(iv) the measure $\omega$ is pseudoconvex on the set of s.p.d. matrices, and thus any stationary point is a global minimizer of $\omega$;

(v) let $V$ be a full-rank $m \times n$ matrix, $n \le m$. Then the optimal column scaling that minimizes the measure $\omega$, i.e., that solves
$$\min_D \; \omega\big((VD)^T (VD)\big) \tag{9}$$
over positive diagonal matrices $D$, is given (up to a common positive factor, by (ii)) by
$$D_{ii} = \|V_i\|^{-1}, \quad i = 1, \ldots, n, \tag{10}$$
where $V_i$ is the $i$-th column of $V$.


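Proof. That $1 \le \omega(A)$ follows from the arithmetic-geometric mean inequality, while $\omega(A) \le \kappa(A) < (\kappa(A) + 1)^2/\kappa(A)$ follows from the definitions. The equality conditions also follow directly from the definitions. To prove the last inequality in (i), we fix $\lambda_1$ and $\lambda_n$, and thus also $\kappa(A)$. We now minimize $\omega(A)$ by differentiating with respect to $\lambda_2, \ldots, \lambda_{n-1}$ (note (iv) in the Proposition). This yields the equality conditions (8). Substitution shows that

$$\min \omega^n(A) = \frac{(\kappa(A) + 1)^2}{4\,\kappa(A)}, \tag{11}$$

and rearranging gives the last inequality in (i). For $n = 2$ there are no intermediate eigenvalues, so (11) holds with equality and

$$\omega(A) = \frac{\lambda_1 + \lambda_2}{2(\lambda_1 \lambda_2)^{1/2}} = \frac{\kappa(A) + 1}{2\,\kappa(A)^{1/2}}. \tag{12}$$

The derivative of $\omega(A)$ with respect to $\kappa(A)$ can now be seen to be positive since $\kappa(A) > 1$, which proves (iii).

The function $\det(A)$ is log-concave and strictly increasing, and the function $\det(A)^{1/n}$ is concave and increasing; see, e.g., [10]. The trace function is linear and hence convex. Thus $\omega$ is the quotient of a convex function by a concave function, and so it is pseudoconvex. Pseudoconvex functions have the property that every stationary point is a global minimizer; see [11].

To prove (v), let $V$ be given. Then the arithmetic-geometric mean inequality yields

$$\omega\big((VD)^T(VD)\big) = \frac{\operatorname{tr}(D V^T V D)/n}{\det(D V^T V D)^{1/n}} = \frac{\operatorname{tr}(V^T V D^2)/n}{\det(D^2)^{1/n} \det(V^T V)^{1/n}} \ge \frac{\big(\prod_{i=1}^n \|V_i\|^2\big)^{1/n}}{\det(V^T V)^{1/n}}, \tag{13}$$

with equality if and only if $\|V_i\|^2 D_{ii}^2 = \text{constant}$, $i = 1, \ldots, n$. □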
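The inequalities in Proposition 2.1 (i), the scale invariance (ii), the equality case (8), and formula (12) can be spot-checked numerically. The sketch below assumes NumPy and a randomly generated s.p.d. matrix. It is a sanity check, not a proof.

```python
import numpy as np

def omega(A):
    """omega(A) = (trace(A)/n) / det(A)**(1/n), eq. (6)."""
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("A must be positive definite")
    return (np.trace(A) / n) / np.exp(logdet / n)

def kappa(A):
    """l2 condition number, eq. (4)."""
    lam = np.linalg.eigvalsh(A)
    return lam[-1] / lam[0]

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))
A = X @ X.T + 0.1 * np.eye(n)                 # random s.p.d. test matrix

w, k = omega(A), kappa(A)
# (i): 1 <= omega <= kappa < (kappa + 1)^2 / kappa <= 4 omega^n
chain_ok = 1 <= w <= k < (k + 1) ** 2 / k <= 4 * w ** n
# (ii): invariance under positive scaling
scale_ok = np.isclose(omega(3.7 * A), w)
# Equality case (8): middle eigenvalues equal (lambda_1 + lambda_n)/2, so (11) is attained
A_eq = np.diag([4.0, 2.5, 2.5, 1.0])
eq_gap = omega(A_eq) ** 4 - (kappa(A_eq) + 1) ** 2 / (4 * kappa(A_eq))
# n = 2, eq. (12): omega = (kappa + 1) / (2 sqrt(kappa))
A_2 = np.diag([9.0, 1.0])
gap_12 = omega(A_2) - (kappa(A_2) + 1) / (2 * np.sqrt(kappa(A_2)))
print(chain_ok, scale_ok, eq_gap, gap_12)
```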
Property (i) above shows that $\omega(A)$ is a valid condition number; e.g., we can replace $\kappa(A)$ by $4\omega^n(A)$ in (5). Property (v) shows that the measure $\omega$ predicts the best column scaling. Note that this is the column scaling used in practice (see, e.g., [9]), and it is not the one found by minimizing the measure $\kappa$. Moreover, the proof of (v) is particularly simple.
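Property (v) can be checked numerically with a sketch like the following. It assumes NumPy, a randomly generated $V$ with deliberately unequal column norms, and hypothetical helper names. It compares the scaling (10) with the lower bound in (13) and with many random positive scalings.

```python
import numpy as np

def omega(A):
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return (np.trace(A) / n) / np.exp(logdet / n)

def scaled_omega(V, d):
    """Objective of (9): omega((VD)^T (VD)) with D = diag(d), d > 0."""
    VD = V * d                      # broadcasting scales column i by d[i]
    return omega(VD.T @ VD)

rng = np.random.default_rng(1)
m, n = 8, 4
# Illustrative full-rank V with badly scaled columns
V = rng.standard_normal((m, n)) * np.array([1.0, 10.0, 0.1, 3.0])

col_norms = np.linalg.norm(V, axis=0)
d_opt = 1.0 / col_norms             # eq. (10): D_ii = 1 / ||V_i||
best = scaled_omega(V, d_opt)

# Right-hand side of (13): a lower bound that does not depend on D
bound = np.exp((2 * np.log(col_norms).sum() - np.linalg.slogdet(V.T @ V)[1]) / n)

# Random positive scalings should never beat d_opt
trials = [scaled_omega(V, rng.uniform(0.01, 100.0, n)) for _ in range(2000)]
print(best, bound, min(trials))
```

With the scaling (10) every column of $VD$ has unit norm, which is exactly the equality condition in (13).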

We use $\omega$ as a measure of 'best' in determining some quasi-Newton updates. This leads to minimizing this measure subject to constraints. The following lemma shows that, under very mild assumptions, we do not have to worry about maintaining positive definiteness in our updates, since they will solve problems like the one posed here.

Lemma 2.1 Given the s.p.d. matrix $C$, consider the quantity

$$p^* = \inf_B \; \omega(BC) \quad \text{subject to} \quad v^T B v = \gamma, \quad B \text{ s.p.d.}, \quad B \in \Omega,$$

where $v \ne 0$ and $\gamma > 0$ are given, and $\Omega$ is a closed set of symmetric matrices. Assume that a feasible $B$ exists. Then the finite value $p^*$ is attained at some s.p.d. $B^*$.


Proof. First note that

$$1 \le p^* \le \alpha, \tag{14}$$

where $\omega\big(\tfrac{\gamma}{v^T v} C\big) = \alpha < \infty$, since $\tfrac{\gamma}{v^T v} I$ satisfies the equality constraint. Choose $\{B_k\}$ such that each $B_k$ is feasible and

$$\lim_{k \to \infty} \omega(B_k C) = p^*. \tag{15}$$

If either $\lambda_1(B_k)$ is unbounded above or $\lambda_n(B_k)$ is not bounded away from zero, then $\sup_k \kappa(B_k) = \sup_k \omega(B_k) = \infty$ by Proposition 2.1 (i). But this implies $\omega(B_k C) \to \infty$, a contradiction. Thus we can assume that $B_k \to \bar B$ for some $\bar B \in \Omega$ which is s.p.d. □

In the sequel we regard the s.p.d. matrices as an open subset of the subspace of symmetric $n \times n$ matrices, with the inner product

$$\langle A, B \rangle = \operatorname{trace}(AB).$$

The gradients of functionals restricted to this subspace are symmetric matrices.
3 ω-Optimal Rank-2 Updates


We now consider the Broyden family of updates

$$B_\phi = B_c - \frac{1}{s^T B_c s} B_c s s^T B_c + \frac{1}{y^T s} y y^T + (1 - \phi)\,(s^T B_c s)\, w w^T, \tag{16}$$

where $s = x_+ - x_c$ is the step taken, $y = g_+ - g_c$, the current approximation $B_c$ to the Hessian is s.p.d., and

$$w = \frac{1}{y^T s}\, y - \frac{1}{s^T B_c s}\, B_c s. \tag{17}$$

If $\phi = 1$ we get the BFGS update, and $\phi = 0$ yields the DFP update. The updates for $\phi \in [0, 1]$ are called the convex class. This is not the most common parameterization, but it is well known, and it allows us to use results directly from [12] without reproving them here and needlessly lengthening the paper.

If we form the Fletcher dual updates, i.e., we exchange the roles of $y$ and $s$ and let $H_c = B_c^{-1}$, then we get the inverse updates

$$H_{\hat\phi} = H_c - \frac{1}{y^T H_c y} H_c y y^T H_c + \frac{1}{y^T s} s s^T + (1 - \hat\phi)\,(y^T H_c y)\, v v^T, \tag{18}$$

where

$$v = \frac{1}{y^T s}\, s - \frac{1}{y^T H_c y}\, H_c y. \tag{19}$$

We now have that $\hat\phi = 1$ and $\hat\phi = 0$ yield the DFP and BFGS updates, respectively.
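Every member of the Broyden family of updates satisfies the secant condition

$$B_\phi s = y. \tag{20}$$

Furthermore, let

$$a = y^T H_c y, \qquad b = y^T s, \qquad c = s^T B_c s. \tag{21}$$

Note that, by the Cauchy-Schwarz inequality applied to $b = (B_c^{-1/2} y)^T (B_c^{1/2} s)$, we have $b^2 \le ac$, with equality if and only if $B_c^{-1/2} y$ and $B_c^{1/2} s$ are collinear. This is true if and only if $y$ and $B_c s$ are collinear, which in turn is true if and only if $H_c y$ and $s$ are collinear. From [12], $B_\phi$ is s.p.d. if $b^2 < ac$, $b > 0$, and $\phi < \dfrac{ac}{ac - b^2}$.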
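For experimenting with the family, (16)-(19) translate directly into code. The following NumPy sketch uses hypothetical function names and random data with $y = Gs$, $G$ s.p.d. (an assumption made only so that $y^T s > 0$). It checks the secant condition (20) and the positive-definiteness threshold quoted from [12].

```python
import numpy as np

def broyden_B(Bc, s, y, phi):
    """Broyden update (16), this paper's parameterization: phi=1 BFGS, phi=0 DFP."""
    Bs = Bc @ s
    b, c = y @ s, s @ Bs
    w = y / b - Bs / c                                   # eq. (17)
    return (Bc - np.outer(Bs, Bs) / c + np.outer(y, y) / b
            + (1.0 - phi) * c * np.outer(w, w))

def broyden_H(Hc, s, y, phi_hat):
    """Inverse update (18): phi_hat=1 DFP, phi_hat=0 BFGS."""
    Hy = Hc @ y
    a, b = y @ Hy, y @ s
    v = s / b - Hy / a                                   # eq. (19)
    return (Hc - np.outer(Hy, Hy) / a + np.outer(s, s) / b
            + (1.0 - phi_hat) * a * np.outer(v, v))

# Hypothetical test data: y = G s with G s.p.d., so that y^T s > 0
rng = np.random.default_rng(2)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s

a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s                 # eq. (21)
phi_max = a * c / (a * c - b * b)                       # s.p.d. threshold from [12]
for phi in (0.0, 0.5, 1.0, 0.9 * phi_max, phi_max + 1.0):
    Bp = broyden_B(Bc, s, y, phi)
    print(f"phi={phi:8.3f}  secant ok={np.allclose(Bp @ s, y)}  "
          f"min eig={np.linalg.eigvalsh(Bp).min():.3e}")
```

The last value of $\phi$ lies beyond the threshold and produces an indefinite matrix that still satisfies the secant condition.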

A $\kappa$-optimal rank-two update is found in [1] by minimizing the measure $\kappa(H_c B_\phi)$ over the Broyden family of rank-two updates. Note that the spectrum of a matrix product $C_1 C_2$ is equal to the spectrum of $C_2 C_1$, and $\kappa(C) = \kappa(C^{-1})$. Consequently, we can replace $H_c B_\phi$ by any of $B_\phi^{-1} B_c$, $B_c B_\phi^{-1}$, $B_\phi^{-1/2} B_c B_\phi^{-1/2}$, etc. (See also [12, Chapter VII].) Related work is found in [13].

We now consider the problem of finding those updates in the Broyden family that minimize the measure $\omega$. In the following, we assume that $n \ge 2$, $B_c$ is an $n \times n$ s.p.d. matrix, and $s, y \in \mathbb{R}^n$ satisfy $s^T y > 0$ with $y$ and $B_c s$ linearly independent. If $y$ and $B_c s$ are linearly dependent, then the entire Broyden family of updates (as well as the sized updates) reduces to the symmetric rank-one, or SR1, update. Otherwise, $\phi_{SR1} = -\dfrac{c}{b - c}$ and $\hat\phi_{SR1} = -\dfrac{a}{b - a}$. Moreover, the inverse of the DFP update $B_0$ is $H_1$, and the inverse of the BFGS update $B_1$ is $H_0$. In general,

$$\hat\phi = \iota(\phi) = \frac{1 - \phi}{1 + \phi\left(\dfrac{b^2}{ac} - 1\right)} \tag{22}$$

is a one-to-one and onto mapping (cf. [14]) that relates the values $\phi$ and $\hat\phi$ for which $B_\phi^{-1} = H_{\hat\phi}$. (This formula corrects a typographical error in [12] and [14].) The mapping satisfies $\iota^2(\phi) = \phi$ and maps the convex class $\phi \in [0, 1]$ onto $\hat\phi \in [0, 1]$.
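The following is a numerical check of the mapping $\iota(\phi) = (1 - \phi)/\big(1 + \phi(b^2/(ac) - 1)\big)$. Under the same illustrative assumptions (NumPy, random data, hypothetical helper names), the sketch verifies that $B_\phi^{-1} = H_{\iota(\phi)}$, that $\iota$ is an involution, and that $\phi_{SR1} = -c/(b - c)$ reproduces the SR1 update.

```python
import numpy as np

def broyden_B(Bc, s, y, phi):
    """Direct update (16)-(17)."""
    Bs = Bc @ s
    b, c = y @ s, s @ Bs
    w = y / b - Bs / c
    return (Bc - np.outer(Bs, Bs) / c + np.outer(y, y) / b
            + (1.0 - phi) * c * np.outer(w, w))

def broyden_H(Hc, s, y, phi_hat):
    """Inverse update (18)-(19)."""
    Hy = Hc @ y
    a, b = y @ Hy, y @ s
    v = s / b - Hy / a
    return (Hc - np.outer(Hy, Hy) / a + np.outer(s, s) / b
            + (1.0 - phi_hat) * a * np.outer(v, v))

def iota(phi, a, b, c):
    """Eq. (22): the phi_hat for which inv(B_phi) equals H_phi_hat."""
    return (1.0 - phi) / (1.0 + phi * (b * b / (a * c) - 1.0))

rng = np.random.default_rng(3)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s              # y^T s > 0 by construction
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s

for phi in (-1.0, 0.0, 0.3, 1.0):
    same = np.allclose(np.linalg.inv(broyden_B(Bc, s, y, phi)),
                       broyden_H(Hc, s, y, iota(phi, a, b, c)))
    print(phi, iota(phi, a, b, c), same)

# The SR1 member of the family
r = y - Bc @ s
B_sr1 = Bc + np.outer(r, r) / (r @ s)      # symmetric rank-one update
phi_sr1 = -c / (b - c)
print(np.allclose(broyden_B(Bc, s, y, phi_sr1), B_sr1))
```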

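Lemma 3.1 [12, p. 111] The matrix $B_c^{-1/2} B_\phi B_c^{-1/2}$ has $n - 2$ unit eigenvalues, and the two remaining eigenvalues are

$$\lambda_\pm(\phi) = f_1(\phi) \pm \big(f_1(\phi)^2 - f_2(\phi)\big)^{1/2}, \tag{23}$$

where

$$f_1(\phi) = \frac{a(b + c) - \phi(ac - b^2)}{2b^2}, \qquad f_2(\phi) = \frac{b^2 + (1 - \phi)(ac - b^2)}{bc}. \tag{24}$$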

We now present the $\omega$-optimal update from the Broyden family. Notice that for large $n$ the $\omega$-optimal update looks more and more like the BFGS update, but that it is in the convex class only when $a \ge b$.

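Theorem 3.1 The minimum over $\phi$ of $\omega(B_c^{-1/2} B_\phi B_c^{-1/2})$ is attained at

$$\phi_* = 1 + \frac{(a - b)b}{(1 - n)(ac - b^2)}. \tag{25}$$

Furthermore, $B_{\phi_*}$ is s.p.d. and

$$\iota(\phi_*) = \frac{-\dfrac{(a - b)b}{(1 - n)(ac - b^2)}}{1 + \left(1 + \dfrac{(a - b)b}{(1 - n)(ac - b^2)}\right)\left(\dfrac{b^2}{ac} - 1\right)}. \tag{26}$$

Proof. From Lemma 3.1,

$$\omega(B_c^{-1/2} B_\phi B_c^{-1/2}) = \frac{\big(2 f_1(\phi) + n - 2\big)/n}{f_2(\phi)^{1/n}}. \tag{27}$$

We can now substitute using (24), differentiate, and solve for $\phi_*$, the critical point: setting the derivative of $\log \omega$ with respect to $\phi$ to zero gives $n\,c\,f_2(\phi) = b\,\big(2 f_1(\phi) + n - 2\big)$, which is linear in $\phi$, and solving it yields (25). Since $\omega$ is pseudoconvex and both $f_1$ and $f_2$ are linear in $\phi$, $\phi_*$ is the global minimizer. The $\omega$-optimal update is s.p.d. because it is easy to show that, for $n \ge 2$, $\phi_* < \dfrac{ac}{ac - b^2}$; this inequality is equivalent to $(n - 2)b + a > 0$. (This also follows from Lemma 2.1 with the appropriate choice of the closed set $\Omega$.) □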
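Theorem 3.1 lends itself to a brute-force comparison. The sketch below (NumPy, random data, hypothetical helper names) evaluates $\omega$ along a grid of $\phi$ values below the threshold $ac/(ac - b^2)$ and checks that none beats $\phi_*$ from (25). It also checks formula (27).

```python
import numpy as np

def omega(A):
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return (np.trace(A) / n) / np.exp(logdet / n)

def broyden_B(Bc, s, y, phi):
    """Direct update (16)-(17)."""
    Bs = Bc @ s
    b, c = y @ s, s @ Bs
    w = y / b - Bs / c
    return (Bc - np.outer(Bs, Bs) / c + np.outer(y, y) / b
            + (1.0 - phi) * c * np.outer(w, w))

rng = np.random.default_rng(5)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s
a, b, c = y @ np.linalg.solve(Bc, y), y @ s, s @ Bc @ s
Linv = np.linalg.inv(np.linalg.cholesky(Bc))     # Bc = L L^T

def omega_of(B):
    """omega(Bc^{-1/2} B Bc^{-1/2}) via the congruent matrix L^{-1} B L^{-T}."""
    M = Linv @ B @ Linv.T
    return omega((M + M.T) / 2)

phi_star = 1 + (a - b) * b / ((1 - n) * (a * c - b * b))    # eq. (25)
phi_max = a * c / (a * c - b * b)
w_star = omega_of(broyden_B(Bc, s, y, phi_star))

# Formula (27), with f1 and f2 from Lemma 3.1
f1 = (a * (b + c) - phi_star * (a * c - b * b)) / (2 * b * b)
f2 = (b * b + (1 - phi_star) * (a * c - b * b)) / (b * c)
w_27 = ((2 * f1 + n - 2) / n) / f2 ** (1 / n)

grid = np.linspace(phi_star - 2.0, min(phi_star + 2.0, phi_max - 1e-3), 2001)
vals = np.array([omega_of(broyden_B(Bc, s, y, p)) for p in grid])
print(phi_star, grid[vals.argmin()], w_star, vals.min(), w_27)
```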

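Corollary 3.1 The minimum over $\hat\phi$ of $\omega(B_c^{1/2} H_{\hat\phi} B_c^{1/2})$ is attained at

$$\hat\phi_* = 1 + \frac{(c - b)b}{(1 - n)(ac - b^2)}.$$

Furthermore, $H_{\hat\phi_*}$ is s.p.d. and

$$\iota(\hat\phi_*) = \frac{-\dfrac{(c - b)b}{(1 - n)(ac - b^2)}}{1 + \left(1 + \dfrac{(c - b)b}{(1 - n)(ac - b^2)}\right)\left(\dfrac{b^2}{ac} - 1\right)}. \tag{28}$$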
Proof. The update formulas (16) and (18) are obtained from each other by exchanging the roles of $y$ and $s$; i.e., the secant condition can be expressed as $B_\phi s = y$ or $H_{\hat\phi} y = s$, where $\phi$ and $\hat\phi$ are related by (22). This is equivalent to exchanging the roles of $a$ and $c$ in Theorem 3.1. □

Note that although $\kappa(A) = \kappa(A^{-1})$ whenever $A^{-1}$ is defined, this is not the case for the measure $\omega$. Therefore, the optimal update obtained in Theorem 3.1 is not, in general, the same as the update obtained in Corollary 3.1, nor are the optimal values of the measure $\omega$ equal. The following table summarizes the results.


Measure $\omega(H_c^{1/2} B_\phi H_c^{1/2})$, for optimal $\phi$:
    $\phi_* = 1 + \dfrac{(a - b)b}{(1 - n)(ac - b^2)}$, and $\hat\phi = \iota(\phi_*)$ is given by (26).

Measure $\omega(B_c^{1/2} H_{\hat\phi} B_c^{1/2})$, for optimal $\hat\phi$:
    $\hat\phi_* = 1 + \dfrac{(c - b)b}{(1 - n)(ac - b^2)}$, and $\phi = \iota(\hat\phi_*)$ is given by (28).

$\omega$-optimal rank-two updates

In general, to find an optimal $\phi_*$, we minimize the measure $\omega(H_c^{1/2} B_\phi H_c^{1/2})$ over $\phi$ and then use Stoer's formula to find the corresponding $\hat\phi = \iota(\phi_*)$. Conversely, an optimal $\hat\phi_*$ refers to the measure $\omega(B_c^{1/2} H_{\hat\phi} B_c^{1/2})$. Notice that for large $n$, $\hat\phi_*$ is near 1, which corresponds to the DFP, and $\hat\phi_*$ is in the convex class only when $c \ge b$.
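The same grid comparison applies to Corollary 3.1, and it also shows the two optimal values of $\omega$ side by side. The sketch assumes NumPy, random data, and hypothetical helper names. It uses $L^T H L$ with $B_c = LL^T$ in place of $B_c^{1/2} H B_c^{1/2}$, since both have the eigenvalues of $B_c H$.

```python
import numpy as np

def omega(A):
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return (np.trace(A) / n) / np.exp(logdet / n)

def broyden_B(Bc, s, y, phi):
    Bs = Bc @ s
    b, c = y @ s, s @ Bs
    w = y / b - Bs / c
    return (Bc - np.outer(Bs, Bs) / c + np.outer(y, y) / b
            + (1.0 - phi) * c * np.outer(w, w))

def broyden_H(Hc, s, y, phi_hat):
    Hy = Hc @ y
    a, b = y @ Hy, y @ s
    v = s / b - Hy / a
    return (Hc - np.outer(Hy, Hy) / a + np.outer(s, s) / b
            + (1.0 - phi_hat) * a * np.outer(v, v))

rng = np.random.default_rng(6)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s
L = np.linalg.cholesky(Bc)                       # Bc = L L^T
Linv = np.linalg.inv(L)

def omega_direct(B):    # omega(Hc^{1/2} B Hc^{1/2})
    M = Linv @ B @ Linv.T
    return omega((M + M.T) / 2)

def omega_inverse(H):   # omega(Bc^{1/2} H Bc^{1/2})
    M = L.T @ H @ L
    return omega((M + M.T) / 2)

D = a * c - b * b
phi_star = 1 + (a - b) * b / ((1 - n) * D)       # Theorem 3.1
phi_hat_star = 1 + (c - b) * b / ((1 - n) * D)   # Corollary 3.1
phi_max = a * c / D                              # same threshold for (18), by duality

grid = np.linspace(phi_hat_star - 2.0, min(phi_hat_star + 2.0, phi_max - 1e-3), 2001)
vals = np.array([omega_inverse(broyden_H(Hc, s, y, p)) for p in grid])
w_direct = omega_direct(broyden_B(Bc, s, y, phi_star))
w_inverse = omega_inverse(broyden_H(Hc, s, y, phi_hat_star))
print("optimal omega values:", w_direct, w_inverse)
```

On random data the two printed values generally differ, in line with the remark that $\omega(A) \ne \omega(A^{-1})$.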
4 Sizing and Weak Secant Updating

The Broyden family $B_\phi$ of rank-two updates satisfies the secant condition and, whenever $y^T s > 0$, preserves positive definiteness for $\phi < \dfrac{ac}{ac - b^2}$. Now let $B_+$ be any symmetric matrix that satisfies the secant condition $B_+ s = y$. This guarantees that the curvature information in $B_+$ along the step $s$ is approximately correct, i.e.,

$$\nabla f(x + s) = g_+ = g + Gs + O(\|s\|^2), \tag{29}$$

$$s^T G s \approx y^T s = s^T B_+ s, \tag{30}$$

$$y^T G^{-1} y \approx y^T s = y^T H_+ y. \tag{31}$$

We refer to (30) and (31) as the direct and inverse weak secant conditions. We use $B_+$ rather than $B_\phi$ since we will no longer restrict ourselves to updates from the Broyden family. By choosing updates that minimize a measure such as $\omega(B_+)$, $\kappa(B_+)$, or $\|B_+ - B_c\|_F$, we attempt to ensure that the new directional information does not destroy too much of the information already built up in $B_c$. Note that the measure $\omega$ does this uniformly over all the eigenvalues, while $\kappa$ deals only with the two extreme eigenvalues. However, the low-rank updates bring one eigenvalue of $H_+ B_c$ (or $H_c B_+$) to 1 at each iteration. If the eigenvalues of $H_+ B_c$ are large, this results in ill-conditioning (cf. [15, p. 275]). This can be corrected by sizing $B_c$. More precisely, $B_c$ is replaced by $\dfrac{y^T s}{s^T B_c s} B_c$ before updating (cf. [16]). Conversely, if $H_c$ is replaced by $\dfrac{y^T s}{y^T H_c y} H_c$ before updating, then we are sizing $H_c$. We now find the $\omega$-optimal $\phi$ and $\hat\phi$ to determine a member of the Broyden family that is obtained after sizing. For each sizing, we obtain two 'optimal' matrices and their inverses.


Theorem 4.1 If $B_c \leftarrow \frac{b}{c} B_c$, then the optimal $\phi_*$ and the corresponding $\hat\phi = \iota(\phi_*)$ are given by

$$\phi_* = 1 + \frac{1}{1 - n} \tag{32}$$

and

$$\hat\phi = \iota(\phi_*) = \frac{1}{(n - 1) + (n - 2)\left(\dfrac{b^2}{ac} - 1\right)}, \tag{33}$$

respectively. Similarly, if $H_c \leftarrow \frac{c}{b} H_c$ (which is the same sizing), then the optimal $\hat\phi_*$ and the corresponding $\phi = \iota(\hat\phi_*)$ are

$$\hat\phi_* = 1 \tag{34}$$

and

$$\iota(\hat\phi_*) = \phi = 0. \tag{35}$$

All values are in the convex class. The optimal $\phi_*$ gives the DFP update if $n = 2$ and approaches the BFGS update as $n$ grows. For every $n$, the optimal $\hat\phi_*$ gives the DFP update.

If $B_c \leftarrow \frac{a}{b} B_c$, then the optimal $\phi_* = 1$, and the BFGS update is optimal. Similarly, if $H_c \leftarrow \frac{b}{a} H_c$, then the optimal $\hat\phi_*$ and the corresponding $\phi = \iota(\hat\phi_*)$ are given by

$$\hat\phi_* = 1 + \frac{1}{1 - n} \tag{36}$$

and

$$\iota(\hat\phi_*) = \phi = \frac{1}{(n - 1) + (n - 2)\left(\dfrac{b^2}{ac} - 1\right)}, \tag{37}$$

respectively. These values are always in the convex class. The optimal $\phi_*$ is the BFGS update. The optimal $\hat\phi_*$ is also the BFGS update if $n = 2$, but it approaches the DFP update as $n$ grows.

Before  giving  the  proof,  we  summarize  the  results  in  some  tables.  We  prematurely  include  the 
result  of  Theorem  4.4  that  the  values  are  the  same  after  weak  updating. 


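Measure $\omega(H_c^{1/2} B_\phi H_c^{1/2})$, for optimal $\phi$:
    $\phi_* = 1 + \dfrac{1}{1 - n}$, $\quad \hat\phi = \iota(\phi_*) = \dfrac{1}{(n - 1) + (n - 2)\left(\frac{b^2}{ac} - 1\right)}$.

Measure $\omega(B_c^{1/2} H_{\hat\phi} B_c^{1/2})$, for optimal $\hat\phi$:
    $\hat\phi_* = 1$, $\quad \phi = \iota(\hat\phi_*) = 0$.

$\omega$-optimal rank-two updates after sizing $B_c \leftarrow \frac{b}{c} B_c$ (or direct shift)

Measure $\omega(H_c^{1/2} B_\phi H_c^{1/2})$, for optimal $\phi$:
    $\phi_* = 1$, $\quad \hat\phi = \iota(\phi_*) = 0$.

Measure $\omega(B_c^{1/2} H_{\hat\phi} B_c^{1/2})$, for optimal $\hat\phi$:
    $\hat\phi_* = 1 + \dfrac{1}{1 - n}$, $\quad \phi = \iota(\hat\phi_*) = \dfrac{1}{(n - 1) + (n - 2)\left(\frac{b^2}{ac} - 1\right)}$.

$\omega$-optimal rank-two updates after inverse sizing $H_c \leftarrow \frac{b}{a} H_c$ (or weak inverse update)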
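Proof. Since $B_c \leftarrow \frac{b}{c} B_c$, we see that

$$b \leftarrow b, \qquad c \leftarrow \frac{b}{c}\, c = b, \qquad a \leftarrow \frac{c}{b}\, a.$$

(In both sizings considered here the product $ac$, and hence $ac - b^2$ and $b^2/(ac)$, is unchanged.) Thus Theorem 3.1 yields

$$\phi_* = 1 + \frac{\left(\frac{ca}{b} - b\right) b}{(1 - n)(ac - b^2)} = 1 + \frac{1}{1 - n}. \tag{38}$$

Applying (26) then yields (33). If $H_c \leftarrow \frac{b}{a} H_c$, or equivalently $B_c \leftarrow \frac{a}{b} B_c$, we have $b \leftarrow b$, $a \leftarrow b$, and $c \leftarrow \frac{a}{b} c$, and (25) yields

$$\phi_* = 1 + \frac{(b - b)b}{(1 - n)(ac - b^2)} = 1. \tag{39}$$

Using Corollary 3.1 instead of Theorem 3.1 completes the proof. □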
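The substitutions used in this proof are easy to confirm numerically. The sketch below (NumPy, random data, hypothetical helper names) recomputes $a$, $b$, $c$ after each sizing and evaluates (25), the formula of Corollary 3.1, and $\iota$ from (22).

```python
import numpy as np

def abc(Bc, s, y):
    """Return a = y^T Hc y, b = y^T s, c = s^T Bc s, eq. (21)."""
    return y @ np.linalg.solve(Bc, y), y @ s, s @ Bc @ s

def phi_star(a, b, c, n):          # Theorem 3.1, eq. (25)
    return 1 + (a - b) * b / ((1 - n) * (a * c - b * b))

def phi_hat_star(a, b, c, n):      # Corollary 3.1
    return 1 + (c - b) * b / ((1 - n) * (a * c - b * b))

def iota(phi, a, b, c):            # eq. (22)
    return (1 - phi) / (1 + phi * (b * b / (a * c) - 1))

rng = np.random.default_rng(7)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s

a, b, c = abc(Bc, s, y)
a1, b1, c1 = abc((b / c) * Bc, s, y)   # sizing: Bc <- (b/c) Bc
a2, b2, c2 = abc((a / b) * Bc, s, y)   # inverse sizing: Hc <- (b/a) Hc
phi_33 = 1 / ((n - 1) + (n - 2) * (b * b / (a * c) - 1))

print("sizing:        ", phi_star(a1, b1, c1, n), phi_hat_star(a1, b1, c1, n))
print("inverse sizing:", phi_star(a2, b2, c2, n), phi_hat_star(a2, b2, c2, n))
print("predicted:     ", 1 + 1 / (1 - n), phi_33)
```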

Notice  that  n  =  2  is  a  very  special  case. 

Sizing $B_c$ (or $H_c$) can cause a drastic change in all the eigenvalues of $B_c$, so that although $G$ and $B_c$ may have had relatively well-overlapping spectra, they no longer do. After the first sizing step, a better strategy might be to shift the spectrum by a rank-one update. If we want to size both $B_c$ and $H_c$ simultaneously, we can use a rank-two update. Consequently, we will consider finding the 'closest' matrix to $B_c$ that satisfies the weak secant conditions.

The  next  few  results  hold  some  surprises  for  experts  in  the  field. 


Theorem 4.2 Let $u, v$ be nonzero vectors in $\mathbb{R}^n$, and let $A, M$ be symmetric matrices with $M$ s.p.d. Then

$$\bar A = A + \frac{u^T v - v^T A v}{(v^T M v)^2}\, M v v^T M$$

uniquely solves

$$\min_{\bar A} \|W^T(\bar A - A)W\|_F \quad \text{subject to} \quad v^T \bar A v = u^T v,$$

independently of the choice of $W$ with $M^{-1} = WW^T$. Moreover, if $A$ is s.p.d. and $M = A$, then $\bar A$ is s.p.d. if and only if $u^T v > 0$.


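Proof. First note that $\bar A$ is feasible, and let $C$ be any other feasible matrix. Set $x = W^{-1} v$, so that $W^T M v = x$ and $v^T M v = x^T x$. Then $W^T(\bar A - A)W = \dfrac{u^T v - v^T A v}{(x^T x)^2}\, x x^T$. Thus

$$\|W^T(\bar A - A)W\|_F = \frac{|u^T v - v^T A v|}{x^T x} = \frac{|x^T W^T(C - A)W x|}{x^T x} \le \|W^T(C - A)W\|_F.$$

Now if $A = M$ is s.p.d., then $\bar A$ is s.p.d. if and only if the rank-one update

$$W^T \bar A W = I + \frac{u^T v - v^T A v}{(v^T A v)^2}\, W^{-1} v v^T W^{-T}$$

is s.p.d. The latter is s.p.d. if and only if

$$0 < 1 + \operatorname{trace}\left(\frac{u^T v - v^T A v}{(v^T A v)^2}\, W^{-1} v v^T W^{-T}\right) = 1 + \frac{u^T v - v^T A v}{v^T A v} = \frac{u^T v}{v^T A v},$$

which is true if and only if $u^T v > 0$. □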
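Here is a numerical check of Theorem 4.2, assuming NumPy and random data. `weak_update` is a hypothetical helper implementing the formula for $\bar A$. Feasible competitors are generated by removing from a random symmetric perturbation $E$ its component along $Mvv^TM$, so that $v^T E v = 0$.

```python
import numpy as np

def weak_update(A, M, u, v):
    """Theorem 4.2: Abar = A + (u^T v - v^T A v)/(v^T M v)^2 * M v v^T M."""
    Mv = M @ v
    return A + (u @ v - v @ A @ v) / (v @ Mv) ** 2 * np.outer(Mv, Mv)

rng = np.random.default_rng(8)
n = 5
S, X = rng.standard_normal((2, n, n))
A = S + S.T                                  # symmetric, not necessarily definite
M = X @ X.T + np.eye(n)                      # s.p.d. weighting matrix
u, v = rng.standard_normal((2, n))

Abar = weak_update(A, M, u, v)
W = np.linalg.cholesky(np.linalg.inv(M))     # one choice of W with W W^T = M^{-1}

def wnorm(C):
    """Weighted Frobenius distance ||W^T (C - A) W||_F."""
    return np.linalg.norm(W.T @ (C - A) @ W, "fro")

# Feasible competitors Abar + E with v^T E v = 0
Mv = M @ v
gaps = []
for _ in range(500):
    E = rng.standard_normal((n, n))
    E = E + E.T
    E -= (v @ E @ v) / (v @ Mv) ** 2 * np.outer(Mv, Mv)
    gaps.append(wnorm(Abar + E) - wnorm(Abar))

# Positive-definiteness part of the theorem, with M = A s.p.d.
A_pd = M.copy()
u_pos = u if u @ v > 0 else -u               # force u^T v > 0
print(min(gaps),
      np.linalg.eigvalsh(weak_update(A_pd, A_pd, u_pos, v)).min(),
      np.linalg.eigvalsh(weak_update(A_pd, A_pd, -u_pos, v)).min())
```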

Now we apply this theorem to direct shifting and then to weak inverse updating, i.e., to shifting $B$ and then to shifting $H$. We use the terms Greenstadt and DFP to refer to the choice of $W$. In fact, Greenstadt never considered least changes to $B_c$, only to $H_c$.


Corollary 4.1 The direct weak Greenstadt update

$$B = B_c + \frac{y^T s - s^T B_c s}{(s^T B_c s)^2}\, B_c s s^T B_c \tag{40}$$

uniquely solves

$$\min \|W^T(B - B_c)W\|_F \quad \text{subject to} \quad s^T B s = y^T s$$

for any square $W$ such that $H_c = WW^T$. Moreover, $B$ is s.p.d. if and only if $y^T s > 0$. Also, the direct weak DFP update

$$B = B_c + \frac{y^T s - s^T B_c s}{(y^T s)^2}\, y y^T \tag{41}$$

uniquely solves

$$\min \|W^T(B - B_c)W\|_F \quad \text{subject to} \quad s^T B s = y^T s \tag{42}$$

for any square $W$ such that $(WW^T)^{-1} s = y$.

It  is  surprising  that  (40)  is  a  hereditarily  positive  definite  update,  but  (41)  is  not.  This  is 
directly  opposite  the  case  for  strong  secant  methods  with  the  same  weighted  Frobenius  norms 
since  (41)  corresponds  to  the  DFP  secant  update.  Now  we  will  see  in  the  next  corollary  that  the 
same  twist  holds  for  weak  inverse  updating;  the  Greenstadt  inverse  update  is  hereditarily  positive 
definite,  but  the  BFGS  is  not,  as  we  found  with  the  first  two  randomly  generated  examples  in 
MATLAB  that  we  tried. 
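The positive-definiteness twist can be seen on a hand-built $2 \times 2$ example (our own, not one of the MATLAB examples mentioned above). With $y^T s$ small relative to $s^T B_c s$, the weak DFP update (41) acquires a negative eigenvalue while the weak Greenstadt update (40) does not, and dually for (43) and (44). NumPy is assumed.

```python
import numpy as np

# Direct case: Bc = I, s = e_1, y = (0.1, 1), so b = y^T s = 0.1 > 0 and c = s^T Bc s = 1
Bc = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([0.1, 1.0])
b, c = y @ s, s @ Bc @ s
Bs = Bc @ s
B_green = Bc + (b - c) / c**2 * np.outer(Bs, Bs)        # direct weak Greenstadt (40)
B_dfp = Bc + (b - c) / b**2 * np.outer(y, y)            # direct weak DFP (41)

# Inverse case: Hc = I, y = e_1, s = (0.1, 1), so b = 0.1 and a = y^T Hc y = 1
Hc = np.eye(2)
y_i = np.array([1.0, 0.0])
s_i = np.array([0.1, 1.0])
a_i, b_i = y_i @ Hc @ y_i, y_i @ s_i
Hy = Hc @ y_i
H_green = Hc + (b_i - a_i) / a_i**2 * np.outer(Hy, Hy)  # inverse weak Greenstadt (43)
H_bfgs = Hc + (b_i - a_i) / b_i**2 * np.outer(s_i, s_i) # weak BFGS (44)

for name, Mx in [("(40)", B_green), ("(41)", B_dfp), ("(43)", H_green), ("(44)", H_bfgs)]:
    print(name, "smallest eigenvalue:", np.linalg.eigvalsh(Mx).min())
```

All four matrices satisfy their weak secant condition, yet only the Greenstadt-weighted ones stay positive definite.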


Corollary 4.2 The inverse weak Greenstadt update

$$H = H_c + \frac{y^T s - y^T H_c y}{(y^T H_c y)^2}\, H_c y y^T H_c \tag{43}$$

uniquely solves

$$\min \|W^T(H - H_c)W\|_F \quad \text{subject to} \quad y^T H y = y^T s$$

for any square $W$ such that $B_c = WW^T$. Moreover, $H$ is s.p.d. if and only if $y^T s > 0$. Also, the weak BFGS update

$$H = H_c + \frac{y^T s - y^T H_c y}{(y^T s)^2}\, s s^T \tag{44}$$

uniquely solves

$$\min \|W^T(H - H_c)W\|_F \quad \text{subject to} \quad y^T H y = y^T s \tag{45}$$

for any square $W$ such that $(WW^T)^{-1} y = s$.


If we try to find the 'best' update with respect to our measure $\omega$ that satisfies the spectral conditions (30) and (31), then we simply scale $B_c$ or $H_c$, since the value of $\omega(B H_c)$ or $\omega(B^{-1} B_c)$ will then be one. It is interesting that if we let $\sigma > 0$ and apply the following theorem to $\sigma B_c$, we get $(\sigma B_c)_+ = B_+$; i.e., the result does not depend on the initial scaling of $B_c$.


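Theorem 4.3 The update $B$ that solves

$$\min \; \omega(H_c^{1/2} B H_c^{1/2}) \;\; \big(\text{or } \omega(B_c^{1/2} B^{-1} B_c^{1/2})\big) \quad \text{subject to} \quad s^T B s = y^T s \tag{46}$$

is

$$B = \frac{y^T s}{s^T B_c s}\, B_c. \tag{47}$$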
Proof. Since $H_c^{1/2} B H_c^{1/2} = \dfrac{y^T s}{s^T B_c s}\, I$, we have obtained the minimum of both $\kappa$ and $\omega$, i.e., $\omega = 1 = \kappa$. □


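Corollary 4.3 The update $B$ that solves

$$\min \; \omega(B_c^{1/2} B^{-1} B_c^{1/2}) \;\; \big(\text{or } \omega(H_c^{1/2} B H_c^{1/2})\big) \quad \text{subject to} \quad y^T B^{-1} y = y^T s \tag{48}$$

is obtained from

$$B^{-1} = \frac{y^T s}{y^T H_c y}\, H_c. \tag{49}$$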

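Measure $\|H_c^{1/2}(B - B_c)H_c^{1/2}\|_F$ (constraint $s^T B s = y^T s$):
    optimal $B = B_c + \dfrac{b - c}{c^2} B_c s s^T B_c$; optimal inverse $B^{-1} = H_c + \dfrac{c - b}{bc} s s^T$.

Measure $\|B_c^{1/2}(B^{-1} - H_c)B_c^{1/2}\|_F$ (constraint $y^T H y = y^T s$):
    optimal $B = B_c + \dfrac{a - b}{ab} y y^T$; optimal inverse $B^{-1} = H_c + \dfrac{b - a}{a^2} H_c y y^T H_c$.

Measure $\omega(H_c^{1/2} B H_c^{1/2})$ (or $\omega(B_c^{1/2} B^{-1} B_c^{1/2})$) (constraint $s^T B s = y^T s$):
    optimal $B = \dfrac{b}{c} B_c$; optimal inverse $B^{-1} = \dfrac{c}{b} H_c$.

Measure $\omega(B_c^{1/2} B^{-1} B_c^{1/2})$ (or $\omega(H_c^{1/2} B H_c^{1/2})$) (constraint $y^T H y = y^T s$):
    optimal $B = \dfrac{a}{b} B_c$; optimal inverse $B^{-1} = \dfrac{b}{a} H_c$.

Optimal updates for weak secant equations.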
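The four rows of the table can be checked against their constraints and against the inverses listed in the last column. The sketch assumes NumPy and random data with $y^T s > 0$. Here $a$, $b$, $c$ are as in (21).

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s                # y^T s > 0 by construction
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s
Bs, Hy = Bc @ s, Hc @ y

# (B, B^{-1}) pairs from the table; each key notes the constraint enforced
table = {
    "Frobenius, direct (s^T B s = y^T s)":
        (Bc + (b - c) / c**2 * np.outer(Bs, Bs), Hc + (c - b) / (b * c) * np.outer(s, s)),
    "Frobenius, inverse (y^T H y = y^T s)":
        (Bc + (a - b) / (a * b) * np.outer(y, y), Hc + (b - a) / a**2 * np.outer(Hy, Hy)),
    "omega, direct (s^T B s = y^T s)": ((b / c) * Bc, (c / b) * Hc),
    "omega, inverse (y^T H y = y^T s)": ((a / b) * Bc, (b / a) * Hc),
}
for name, (B, Binv) in table.items():
    print(f"{name}: inverse ok={np.allclose(np.linalg.inv(B), Binv)}, "
          f"s^T B s={s @ B @ s:.6f}, y^T B^-1 y={y @ Binv @ y:.6f}, b={b:.6f}")
```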


In Theorem 4.1 we presented the optimal rank-2 updates in the Broyden family obtained after sizing and using the measure $\omega$. We now show that we obtain the same optimal $\phi_*$ (and $\hat\phi_*$) when we strongly update the weakly updated matrix. This does not mean that the corresponding $B_+$ matrices will be the same; it just means that they are obtained from the sized or weakly updated $B_c$ (or $H_c$) using the same formula from the Broyden class.

Theorem 4.4 The optimal $\phi_*$ and $\hat\phi_*$ expressions in Theorem 4.1 are unchanged if we replace sizing $B_c$ ($B_c \leftarrow \frac{b}{c} B_c$ and $H_c \leftarrow \frac{c}{b} H_c$) with the direct weak Greenstadt update (40) of Corollary 4.1, and we replace inverse sizing $H_c$ ($H_c \leftarrow \frac{b}{a} H_c$ and $B_c \leftarrow \frac{a}{b} B_c$) with the inverse weak Greenstadt update (43) of Corollary 4.2.

Proof. Suppose that we apply the direct weak update (40). Then the Sherman-Morrison formula yields

$$H = H_c + \frac{c - b}{bc}\, s s^T,$$

which implies

$$a \leftarrow a + \frac{b}{c}(c - b).$$

We also have $b \leftarrow b$ and $c \leftarrow b$. Therefore, Theorem 3.1 applied with these updated values yields

$$\phi_* = 1 + \frac{\left(a + \frac{b}{c}(c - b) - b\right) b}{(1 - n)\left(\left(a + \frac{b}{c}(c - b)\right) b - b^2\right)} = 1 + \frac{1}{1 - n}.$$

Similarly, the optimal $\hat\phi_* = 1$, since now $c = b$. By exchanging the roles of $H_c$ and $B_c$, we see that the weak inverse update yields

$$\hat\phi_* = 1 + \frac{1}{1 - n}, \qquad \phi_* = 1.$$

□
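The bookkeeping in this proof (the new values of $a$, $b$, $c$ after each weak update) can be reproduced numerically. The sketch assumes NumPy and random data; the helper names are hypothetical.

```python
import numpy as np

def phi_star(a, b, c, n):        # Theorem 3.1, eq. (25)
    return 1 + (a - b) * b / ((1 - n) * (a * c - b * b))

def phi_hat_star(a, b, c, n):    # Corollary 3.1
    return 1 + (c - b) * b / ((1 - n) * (a * c - b * b))

rng = np.random.default_rng(10)
n = 6
X, Y = rng.standard_normal((2, n, n))
Bc = X @ X.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = (Y @ Y.T + np.eye(n)) @ s
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s

# Direct weak Greenstadt update (40) and its new a, b, c
Bs = Bc @ s
B_w = Bc + (b - c) / c**2 * np.outer(Bs, Bs)
a1, b1, c1 = y @ np.linalg.solve(B_w, y), y @ s, s @ B_w @ s

# Inverse weak Greenstadt update (43) and its new a, b, c
Hy = Hc @ y
H_w = Hc + (b - a) / a**2 * np.outer(Hy, Hy)
a2, b2, c2 = y @ H_w @ y, y @ s, s @ np.linalg.solve(H_w, s)

print("direct weak: ", phi_star(a1, b1, c1, n), phi_hat_star(a1, b1, c1, n))
print("inverse weak:", phi_star(a2, b2, c2, n), phi_hat_star(a2, b2, c2, n))
print("1 + 1/(1-n) =", 1 + 1 / (1 - n))
```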

We now apply the weighted Frobenius norm measures after weak updating. Although we do not restrict the updates to the Broyden class, we get a new Fletcher-dual pair of updates in the Broyden class. These symmetric updates are hereditarily positive definite but not always in the convex class. The new updates are (iii) and (iv) in the following theorem.


Theorem 4.5 The following are equivalent updating sequences. The first four are hereditarily positive definite.

(i) The result of a direct weak Greenstadt update or a weak BFGS update, followed in either case by a BFGS update, is a BFGS update.

(ii) The result of an inverse weak Greenstadt update or a weak DFP update, followed in either case by a DFP update, is a DFP update.

(iii) The result of a direct weak Greenstadt update followed by a DFP update is the $\phi = 1 - \dfrac{b}{c}$ update from the Broyden class (16).

(iv) The result of an inverse weak Greenstadt update followed by a BFGS update is the $\hat\phi = 1 - \dfrac{b}{a}$ update from the Broyden class, in the inverse form (18).

The following sequences may not be hereditarily positive definite.

(v) The result of a weak DFP update followed by a BFGS update is the $\phi = \dfrac{c}{b}$ member of the Broyden class (16).

(vi) The result of a weak BFGS update followed by a DFP update is the $\hat\phi = \dfrac{a}{b}$ member of the Broyden class, in the inverse form (18).
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Proof. 

The  proofs  are  much  the  same,  so  we  will  do  only  (i)  and  (iv)  since  they  seem  to  be  the  most 
interesting  updates. 

The  direct  weak  Greenstadt  update  is 

B  =  Bc-\-  ° BcssT Bc  and  Bs  =  (1  +  - — -)Bcs. 


The  weak  BFGS  is 


b  —  c 


b  —  a. 


b  —  a. 


H  =  Hc  +  -j-j-ss1  and  s  -  Hy  =  s  -  Hcy  -  (  L  )s  =  (1 - —)s  -  Hcy 


In  the  first  case, 


5+  =  B- 


BsstB 


b  —  c 


yy 


b  -  c.2BcssT Bc  yyT 


In  the  second, 


H+  =  H  + 


—  Bc  H — ^ — Bcss  Bc  —  (1  H - - — ) 

„  BcsstBc  yyT 

=  - 1 - — . 

c  b 


(s  -  Hy)sT  +  s(s  -  Hy)T  yT(s  -  Hy)ssT 


+ 


b  b2 

b-a  ssT  Hct)sT +  s((l-tz°)s- Hcy)T  n 

~  hc+  b  ~b~  +  b  ° 

(s  -  Hcy)sT  +  s(s  -  Hcy)T  6-a  nb-a  T 
=  Hc+ - b - +  (-p-"2— )"  * 

which  is  the  BFGS  update  of  Hc.  The  proof  of  (iv)  is  as  direct. 

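The inverse weak Greenstadt update of $H_c$ is

$$H = H_c + \frac{b-a}{a^2}\, H_c y y^T H_c,$$

and $Hy = \left(1 + \frac{b-a}{a}\right) H_c y$, $s - Hy = s - H_c y - \frac{b-a}{a} H_c y$. In particular, $y^T(s - Hy) = b - \frac{b}{a}\,a = 0$. So, following with a BFGS update,

$$\begin{aligned} H_+ &= H + \frac{(s - Hy)s^T + s(s - Hy)^T}{b} - \frac{y^T(s - Hy)\, s s^T}{b^2} \\ &= H_c + \frac{b-a}{a^2}\, H_c y y^T H_c + \frac{(s - H_c y)s^T + s(s - H_c y)^T}{b} - \frac{b-a}{ab}\left(H_c y s^T + s y^T H_c\right) \\ &= H_c + \frac{(s - H_c y)s^T + s(s - H_c y)^T}{b} - \frac{b-a}{b^2}\, s s^T + (b-a)\left(\frac{H_c y}{a}\frac{y^T H_c}{a} - \frac{H_c y}{a}\frac{s^T}{b} - \frac{s}{b}\frac{y^T H_c}{a} + \frac{s}{b}\frac{s^T}{b}\right). \end{aligned}$$

The first three terms are the BFGS update of $H_c$, and the last is

$$(b-a)\left(\frac{s}{b} - \frac{H_c y}{a}\right)\left(\frac{s}{b} - \frac{H_c y}{a}\right)^T = \left(\frac{b}{a} - 1\right) a\, v v^T,$$

where $v = s/b - H_c y/a$ is given by (19). Now use (18) with $\phi = 0$ for the BFGS portion, and we get

$$H_+ = H_c - \frac{H_c y y^T H_c}{a} + \frac{s s^T}{b} + a\, v v^T + \left(\frac{b}{a} - 1\right) a\, v v^T,$$

which is (18) with $1 - \phi = 1 + \frac{b}{a} - 1 = \frac{b}{a}$, and (iv) is proven. □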
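A similar numerical check applies to (iv). Here the inverse Broyden family is assumed to have the form $H_c - H_c y y^T H_c/a + s s^T/b + (1-\phi)\,a\,v v^T$ with $v = s/b - H_c y/a$. That is how (18) is used in the proof: $\phi = 0$ gives BFGS and $\phi = 1$ gives DFP. NumPy, the random data, and the helper names are illustrative assumptions.

```python
import numpy as np

def bfgs_inverse(H, s, y):
    """Inverse BFGS: H + [(s - Hy)s^T + s(s - Hy)^T]/b - y^T(s - Hy) s s^T / b^2."""
    b = y @ s
    r = s - H @ y
    return H + (np.outer(r, s) + np.outer(s, r)) / b - (y @ r) * np.outer(s, s) / b**2

def broyden_inverse(Hc, s, y, phi):
    """Assumed inverse Broyden form: DFP part + (1 - phi) a v v^T, v = s/b - Hc y/a."""
    a, b = y @ Hc @ y, y @ s
    Hy = Hc @ y
    v = s / b - Hy / a
    return Hc - np.outer(Hy, Hy) / a + np.outer(s, s) / b + (1 - phi) * a * np.outer(v, v)

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
Bc = M @ M.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = Bc @ s + 0.3 * rng.standard_normal(n)
a, b = y @ Hc @ y, y @ s

# Inverse weak Greenstadt update: rank one along Hc y, enforces y^T H y = b.
Hcy = Hc @ y
H_weak = Hc + (b - a) / a**2 * np.outer(Hcy, Hcy)
H_plus = bfgs_inverse(H_weak, s, y)

# Claim (iv): the result is the Broyden member with 1 - phi = b/a.
print(np.allclose(H_plus, broyden_inverse(Hc, s, y, phi=1 - b / a)))  # True
```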
5 The $\omega$-Optimal Update


In Section 3 we found the ‘best’ rank-two updates in the Broyden class, i.e., the rank-two updates parametrized by $\phi$ in (16) and (18) that minimize the measure $\omega$. If we do not restrict ourselves to rank-two updates, however, and require only that positive definiteness and the secant equation be maintained, we obtain a different result. We show that the best update of $B_c$ in the case $ac \neq b^2$ is obtained by inverse sizing $B_c$ and then applying the BFGS update. Similarly, the best possible inverse update is obtained by sizing and then operating on $H_c$ with the DFP update. The two updates are different, although their optimal values of $\omega$ are equal.

We continue to assume that $y^T s > 0$ and that $y$ and $B_c s$ are linearly independent.


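Theorem 5.1 Assume $B_c$ is s.p.d. and $ac \neq b^2$. Then for

$$\alpha = \frac{b}{a} = \frac{n}{\operatorname{trace}(H_c B_+)},$$

the BFGS update of $\frac{1}{\alpha} B_c$,

$$H_+ = \alpha H_c + u s^T + s u^T, \qquad u = \frac{s - \alpha H_c y}{b},$$

is the unique solution of

$$\min\ \omega(H_c B_+) \quad\text{subject to}\quad B_+ s = y,\ B_+ \text{ s.p.d.}$$

In addition, the Lagrange multiplier for the secant equation is uniquely

$$\lambda = \frac{2(s - \alpha H_c y)}{b\,\alpha\, n\,(\det(H_c B_+))^{1/n}},$$

and the optimal value of the measure is

$$\omega(H_c B_+) = \left(\frac{ac}{b^2}\right)^{1/n}.$$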

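Proof. First, we note that an optimal s.p.d. $B_+$ exists by Lemma 2.1. The Lagrangian for our problem is

$$L(\lambda, B_+) = g(B_+) + \lambda^T(B_+ s - y),$$

where

$$g(B_+) = \frac{\operatorname{trace}(H_c B_+)}{n(\det(H_c))^{1/n}(\det(B_+))^{1/n}}$$

and $\lambda \in \mathbb{R}^n$ is the Lagrange multiplier. For simplicity of notation, we multiply the Lagrangian by the constant $n(\det(H_c))^{1/n}$ and remove this constant from $\lambda$ at the end. Now

$$\lambda^T B_+ s = \operatorname{trace}(\lambda^T B_+ s) = \operatorname{trace}(B_+ s \lambda^T) = \operatorname{trace}(s \lambda^T B_+)$$

and, since $B_+$ is symmetric,

$$\lambda^T B_+ s = s^T B_+ \lambda = \operatorname{trace}(\lambda s^T B_+).$$

Hence

$$\lambda^T B_+ s = \operatorname{trace}\left(\frac{s\lambda^T + \lambda s^T}{2}\, B_+\right),$$

by adding the previous two equivalences and dividing by 2. Therefore the gradient of the linear functional $\lambda^T B_+ s = \left\langle \frac{s\lambda^T + \lambda s^T}{2}, B_+ \right\rangle$ on the subspace of symmetric matrices is

$$\frac{s\lambda^T + \lambda s^T}{2}.$$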
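The symmetrization step can be checked directly. This small sketch assumes NumPy, uses random data, and introduces its own variable names. It evaluates $\lambda^T B s$ for a random symmetric matrix and compares it with $\operatorname{trace}\left(\frac{s\lambda^T + \lambda s^T}{2} B\right)$. It also shows why the raw outer product $s\lambda^T$ is not an admissible gradient on the symmetric subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = A + A.T                     # any symmetric matrix
s = rng.standard_normal(n)
lam = rng.standard_normal(n)    # plays the role of the multiplier lambda

G = 0.5 * (np.outer(s, lam) + np.outer(lam, s))   # claimed gradient, itself symmetric
print(np.isclose(lam @ B @ s, np.trace(G @ B)))   # True

# The raw outer product s lam^T gives the same value against a symmetric B,
# but it is not symmetric, so it is not in the subspace; G is its symmetric part.
R = np.outer(s, lam)
print(np.isclose(np.trace(R @ B), np.trace(G @ B)), np.allclose(R, R.T))  # True False
```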


Using the cofactor expansion of the determinant along a row of the matrix, i.e. $\det(A) = \sum_j a_{kj}(-1)^{k+j}\det(A(k,j))$, where $A(k,j)$ denotes the submatrix of $A$ obtained by deleting row $k$ and column $j$, we see that the gradient of $\det(B_+)$ is $\operatorname{adj} B_+$, the (symmetric) matrix of signed cofactors. We will also need Cramer's rule, i.e. $B_+^{-1} = \frac{1}{\det(B_+)}\operatorname{adj} B_+$.
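The determinant facts can be illustrated in the same way. The sketch below assumes NumPy and a random s.p.d. matrix. It builds the signed cofactor matrix from the row-expansion definition and compares it with $\det(B)\,B^{-1}$ (Cramer's rule). It then checks the directional derivative of $\det$ along a symmetric direction against a central finite difference. The step size `h` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)                  # s.p.d. test matrix

# Signed cofactors from the row-expansion definition: (-1)^(k+j) det(B(k, j)).
cof = np.empty_like(B)
for k in range(n):
    for j in range(n):
        minor = np.delete(np.delete(B, k, axis=0), j, axis=1)
        cof[k, j] = (-1) ** (k + j) * np.linalg.det(minor)

adjB = np.linalg.det(B) * np.linalg.inv(B)   # Cramer's rule: adj B = det(B) B^{-1}

# Directional derivative of det along a symmetric direction E, by central differences.
E = rng.standard_normal((n, n))
E = E + E.T
h = 1e-5
fd = (np.linalg.det(B + h * E) - np.linalg.det(B - h * E)) / (2 * h)

print(np.allclose(cof, adjB))              # cofactor matrix equals adj B (B symmetric)
print(np.isclose(fd, np.trace(adjB @ E)))  # gradient of det is adj B
```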

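We can now differentiate the Lagrangian with respect to $B_+$ and equate the derivative to 0:

$$0 = \frac{1}{(\det B_+)^{2/n}}\left\{(\det B_+)^{1/n} H_c - \operatorname{trace}(H_c B_+)\,\frac{1}{n}(\det B_+)^{\frac{1}{n}-1}\operatorname{adj} B_+\right\} + \frac{s\lambda^T + \lambda s^T}{2},$$

or, using Cramer's rule and multiplying through by $n(\det B_+)^{1/n}/\operatorname{trace}(H_c B_+)$,

$$0 = \frac{n}{\operatorname{trace}(H_c B_+)}\, H_c - B_+^{-1} + \frac{n(\det B_+)^{1/n}}{\operatorname{trace}(H_c B_+)}\,\frac{s\lambda^T + \lambda s^T}{2}. \tag{50}$$

Let

$$\alpha = \frac{n}{\operatorname{trace}(H_c B_+)}, \qquad u = \frac{n(\det B_+)^{1/n}}{2\operatorname{trace}(H_c B_+)}\,\lambda. \tag{51}$$

Then (50) becomes

$$B_+^{-1} = \alpha H_c + s u^T + u s^T. \tag{52}$$

Since $B_+^{-1} y = s$, we obtain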


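$$\alpha H_c y + s u^T y + u s^T y = s \tag{53}$$

or

$$u = \frac{s}{s^T y} - \frac{\alpha H_c y}{s^T y} - s\,\frac{u^T y}{s^T y}. \tag{54}$$

Let

$$\beta = \frac{u^T y}{s^T y} \tag{55}$$

and substitute (54) in (53). We get

$$\alpha H_c y + s\,\frac{s^T y - y^T \alpha H_c y}{s^T y} - \beta\, s s^T y + s - \alpha H_c y - \beta\, s s^T y = s,$$

or

$$\left(1 - \frac{y^T \alpha H_c y}{s^T y} - 2\beta\, s^T y\right) s = 0. \tag{56}$$

Therefore,

$$\beta = \frac{1}{2 s^T y}\left(1 - \frac{y^T \alpha H_c y}{s^T y}\right). \tag{57}$$

From (54), (55), and (57), we conclude

$$u = \frac{1}{s^T y}\left(s - \alpha H_c y - \frac{s^T y - y^T \alpha H_c y}{2\, s^T y}\, s\right). \tag{58}$$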
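Equations (53) to (58) hold for any fixed $\alpha > 0$, before $\alpha$ is pinned down by (60). Here is a minimal check under that reading. It assumes NumPy, random data with $y^T s > 0$, and an arbitrary value $\alpha = 0.7$. It builds $\beta$ from (57) and $u$ from (58), then confirms that (52) satisfies the inverse secant equation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
Hc = M @ M.T + n * np.eye(n)              # s.p.d. inverse Hessian approximation
s = rng.standard_normal(n)
y = s + 0.3 * rng.standard_normal(n)      # keeps s^T y comfortably positive
sty = s @ y
alpha = 0.7                               # arbitrary positive value; (60) is not imposed here

Hcy = Hc @ y
beta = (1.0 - alpha * (y @ Hcy) / sty) / (2.0 * sty)                       # (57)
u = (s - alpha * Hcy - (sty - alpha * (y @ Hcy)) / (2.0 * sty) * s) / sty  # (58)
H_plus = alpha * Hc + np.outer(s, u) + np.outer(u, s)                      # (52)

print(np.allclose(H_plus @ y, s))         # inverse secant equation (53)
print(np.isclose(u @ y / sty, beta))      # consistent with the definition (55)
```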
We can now substitute for $u$ in (52) and obtain

$$\begin{aligned} B_+^{-1} &= \alpha H_c + u s^T + s u^T \\ &= \alpha H_c + \frac{2 s s^T - \alpha H_c y s^T - s y^T \alpha H_c}{s^T y} - \left(\frac{1}{2 s^T y} - \frac{y^T \alpha H_c y}{2 (s^T y)^2}\right) 2 s s^T \\ &= \alpha H_c + \frac{s s^T}{s^T y} - \frac{\alpha H_c y s^T + s y^T \alpha H_c}{s^T y} + \frac{s s^T\, y^T \alpha H_c y}{(s^T y)^2}. \end{aligned} \tag{59}$$


Note that (59) is the BFGS update of $\alpha H_c$ and is equivalent to (18) with $\phi = 0$ (cf. [15, p. 269]). The update $B_+$ is the best possible for our measure, and it is a rank-two update of $\frac{1}{\alpha} B_c$. We therefore conclude that $\frac{1}{\alpha}$ is the scaling of $B_c$ that makes the BFGS update the best among all rank-two updates, a set that includes the Broyden class. We can now apply Theorem 3.1, i.e. we want $\phi$ in (25) to be one in order to get the BFGS update. Since $b > 0$, this is equivalent to scaling $B_c$ so that the new $a$ equals $b$, i.e., $y^T \alpha H_c y = b$, or

$$\alpha = \frac{b}{a}. \tag{60}$$


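The values of the Lagrange multiplier $\lambda$ and of $\alpha$ are given in (51). For the optimal value,

$$\omega(H_c B_+) = \omega((\alpha H_c) B_+)$$

by Proposition 2.1(i). Therefore, $B_+$ is the BFGS update of the sized $B_c$, i.e. $\phi = 1$ and, for the sized data, $a = b$, $b = b$, and $c = ac/b$. Lemma 3.1 then yields

$$\omega(\alpha H_c B_+) = \frac{(2 f_1(1) + n - 2)/n}{(f_2(1))^{1/n}} = \frac{(2 + n - 2)/n}{(b^2/(ac))^{1/n}} = \left(\frac{ac}{b^2}\right)^{1/n}.$$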
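Theorem 5.1 can be exercised numerically. The sketch below assumes NumPy and random data, and its helper names are hypothetical. It forms the inverse-sized BFGS update with $\alpha = b/a$ and then checks four things: the secant equation, the form $H_+ = \alpha H_c + u s^T + s u^T$, the identity $\alpha = n/\operatorname{trace}(H_c B_+)$, and the optimal value $(ac/b^2)^{1/n}$. It also compares the result against two other feasible updates (unsized BFGS and unsized DFP). This spot check illustrates optimality but does not prove it.

```python
import numpy as np

def omega(A):
    """omega(A) = (trace(A)/n) / det(A)^(1/n), for A with positive eigenvalues."""
    n = A.shape[0]
    return (np.trace(A) / n) / np.linalg.det(A) ** (1.0 / n)

def bfgs_inverse(H, s, y):
    """Inverse BFGS: H + [(s - Hy)s^T + s(s - Hy)^T]/b - y^T(s - Hy) s s^T / b^2."""
    b = y @ s
    r = s - H @ y
    return H + (np.outer(r, s) + np.outer(s, r)) / b - (y @ r) * np.outer(s, s) / b**2

def dfp_inverse(H, s, y):
    """Inverse DFP: H - H y y^T H / (y^T H y) + s s^T / b."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
Bc = M @ M.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = Bc @ s + rng.standard_normal(n)
if y @ s <= 0:
    y = -y                                   # the theorem assumes y^T s > 0
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s

alpha = b / a                                # sizing factor (60)
H_plus = bfgs_inverse(alpha * Hc, s, y)      # BFGS update of (1/alpha) Bc
B_plus = np.linalg.inv(H_plus)
u = (s - alpha * (Hc @ y)) / b
w_opt = omega(Hc @ B_plus)

print(np.allclose(B_plus @ s, y))                                         # secant equation
print(np.allclose(H_plus, alpha * Hc + np.outer(u, s) + np.outer(s, u)))  # form in Theorem 5.1
print(np.isclose(alpha, n / np.trace(Hc @ B_plus)))                       # alpha = n / trace(Hc B+)
print(np.isclose(w_opt, (a * c / b**2) ** (1.0 / n)))                     # optimal value

# Two other s.p.d. solutions of B+ s = y, for comparison: unsized BFGS and DFP.
for H_other in (bfgs_inverse(Hc, s, y), dfp_inverse(Hc, s, y)):
    print(omega(Hc @ np.linalg.inv(H_other)) >= w_opt)
```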


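Corollary 1 Assume $B_c$ is s.p.d. and $ac \neq b^2$. Then for

$$\alpha = \frac{b}{c} = \frac{n}{\operatorname{trace}(H_+ B_c)},$$

the DFP update of $\alpha B_c$,

$$B_+ = \alpha B_c + u y^T + y u^T, \qquad u = \frac{y - \alpha B_c s}{b},$$

is the unique solution of

$$\min\ \omega(B_c H_+) \quad\text{subject to}\quad H_+ y = s,\ H_+ \text{ s.p.d.}$$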
In addition, the Lagrange multiplier for the secant equation is uniquely

$$\lambda = \frac{2(y - \alpha B_c s)}{b\,\alpha\, n\,(\det(B_c H_+))^{1/n}},$$

and the optimal value of the measure is equal to the optimal value in Theorem 5.1.

Proof. We need only exchange the roles of $B_c$ and $H_c$ (and of $s$ and $y$) in the theorem and note that the optimal value $(ac/b^2)^{1/n}$ does not change under this exchange. □
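The corollary admits the same kind of check with the roles of $(B_c, s)$ and $(H_c, y)$ exchanged. The assumptions are as before: NumPy, random data, and hypothetical helper names. Here `dfp_direct` is the standard direct DFP formula.

```python
import numpy as np

def omega(A):
    """omega(A) = (trace(A)/n) / det(A)^(1/n), for A with positive eigenvalues."""
    n = A.shape[0]
    return (np.trace(A) / n) / np.linalg.det(A) ** (1.0 / n)

def dfp_direct(B, s, y):
    """Direct DFP: B + [(y - Bs)y^T + y(y - Bs)^T]/b - s^T(y - Bs) y y^T / b^2."""
    b = y @ s
    r = y - B @ s
    return B + (np.outer(r, y) + np.outer(y, r)) / b - (s @ r) * np.outer(y, y) / b**2

rng = np.random.default_rng(6)
n = 4
M = rng.standard_normal((n, n))
Bc = M @ M.T + n * np.eye(n)
Hc = np.linalg.inv(Bc)
s = rng.standard_normal(n)
y = Bc @ s + rng.standard_normal(n)
if y @ s <= 0:
    y = -y                                   # assume y^T s > 0
a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s

alpha = b / c                                # sizing factor of the corollary
B_plus = dfp_direct(alpha * Bc, s, y)        # DFP update of alpha Bc
H_plus = np.linalg.inv(B_plus)
u = (y - alpha * (Bc @ s)) / b

print(np.allclose(H_plus @ y, s))                                         # inverse secant equation
print(np.allclose(B_plus, alpha * Bc + np.outer(u, y) + np.outer(y, u)))  # form in the corollary
print(np.isclose(alpha, n / np.trace(H_plus @ Bc)))                       # alpha = n / trace(H+ Bc)
print(np.isclose(omega(Bc @ H_plus), (a * c / b**2) ** (1.0 / n)))        # same value as Theorem 5.1
```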

This theorem and corollary state that we obtain the same updated matrix $B_+$ whether we apply the optimal formula to $B_c$, to the sized $B_c$, or indeed to any $\sigma B_c$ with $\sigma > 0$. Therefore, the $\omega$-optimal update of $\sigma B_c$ is also the sized BFGS update of $B_c$. Similarly, the $\omega$-optimal inverse update of $\sigma H_c$ is the sized DFP update of $H_c$.
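The scale invariance behind this remark is also easy to see numerically. Replacing $B_c$ by $\sigma B_c$ divides $a$ by $\sigma$, so $\alpha H_c$ with $\alpha = b/a$ is unchanged. The sketch assumes NumPy and random data. The helper `omega_optimal_update` is hypothetical and implements the update of Theorem 5.1.

```python
import numpy as np

def omega(A):
    """omega(A) = (trace(A)/n) / det(A)^(1/n), for A with positive eigenvalues."""
    n = A.shape[0]
    return (np.trace(A) / n) / np.linalg.det(A) ** (1.0 / n)

def bfgs_inverse(H, s, y):
    """Inverse BFGS: H + [(s - Hy)s^T + s(s - Hy)^T]/b - y^T(s - Hy) s s^T / b^2."""
    b = y @ s
    r = s - H @ y
    return H + (np.outer(r, s) + np.outer(s, r)) / b - (y @ r) * np.outer(s, s) / b**2

def omega_optimal_update(Bc, s, y):
    """Hypothetical helper: B+ of Theorem 5.1 (inverse BFGS update of (b/a) Hc)."""
    Hc = np.linalg.inv(Bc)
    a, b = y @ Hc @ y, y @ s
    return np.linalg.inv(bfgs_inverse((b / a) * Hc, s, y))

rng = np.random.default_rng(8)
n = 3
M = rng.standard_normal((n, n))
Bc = M @ M.T + n * np.eye(n)
s = rng.standard_normal(n)
y = Bc @ s + rng.standard_normal(n)
if y @ s <= 0:
    y = -y

B_ref = omega_optimal_update(Bc, s, y)
same = [np.allclose(omega_optimal_update(sigma * Bc, s, y), B_ref) for sigma in (0.01, 3.0, 250.0)]
print(same)                                      # [True, True, True]
print(np.isclose(omega(3.0 * Bc), omega(Bc)))    # omega is scale invariant
```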
6 Concluding Remarks


In this paper we have studied the measure $\omega$ as it relates to the derivation of Hessian updates in least-change secant methods. We have seen, in Section 5, that sizing of the Hessian approximation arises naturally from this measure. In particular, the inverse-sized BFGS and sized DFP updates are the $\omega$-optimal updates. This further illustrates the central role of the BFGS and DFP updates. We have also considered weak secant updating in place of sizing.

Powell  [8]  shows  that  the  DFP  update  performs  far  worse  than  the  BFGS  update  when  applied 
with  direct  prediction  steps  to  the  simple  quadratic  function 

$$f(x) = \tfrac{1}{2}(x_1^2 + x_2^2).$$

He shows that the DFP update is far less effective than the BFGS update at reducing large eigenvalues. Of course, sizing can reduce large eigenvalues immediately; in this respect, sizing can be considered a ‘fix’ for the DFP update. This is corroborated by the following numerical data, which compare Powell’s data with the sized DFP and sized BFGS updates. The initial Hessian approximation is the diagonal matrix $\operatorname{diag}(1, \lambda_1)$, and the initial point is $x_1 = (\cos\psi_1, \sin\psi_1)^T$. The entries give the number of iterations needed to satisfy $\|x_{k+1}\| < \epsilon\|x_1\|$; the numbers in brackets are for the sized updates.


λ₁ \ ψ₁   20 deg     40 deg     60 deg     70 deg     80 deg     85 deg     87 deg     88 deg
10        5 (9)      6 (10)     7 (7)      8 (6)      7 (3)      6 (9)      5 (9)      4 (9)
100       5 (10)     7 (14)     8 (16)     9 (10)     10 (7)     10 (6)     9 (7)      9 (6)
10⁴       5 (14)     7 (27)     8 (30)     9 (20)     11 (12)    12 (8)     13 (11)    14 (13)
10⁶       5 (19)     7 (15)     8 (15)     9 (34)     11 (14)    12 (10)    13 (8)     14 (6)
10⁹       5 (12)     7 (14)     8 (17)     9 (10)     11 (8)     12 (21)    13 (10)    14 (8)

Table 5.1. Number of iterations for the BFGS (sized BFGS) update when $\epsilon = 10^{-4}$

λ₁ \ ψ₁   20 deg     40 deg     60 deg     70 deg     80 deg     85 deg     87 deg     88 deg
10        6 (8)      10 (5)     14 (5)     16 (5)     14 (5)     9 (4)      7 (6)      6 (7)
100       8 (8)      15 (5)     29 (6)     47 (6)     89 (8)     106 (8)    84 (7)     59 (6)
1000      10 (8)     19 (5)     45 (6)     83 (7)     230 (8)    549 (10)   855 (10)   1000 (10)
10⁴       12 (8)     24 (5)     60 (6)     119 (7)    380 (9)    1141 (10)  2420 (11)  4102 (12)
10⁶       15 (8)     34 (5)     92 (6)     181 (7)    752 (9)    3482 (10)  5162 (11)  9194 (11)

Table 5.2. Number of iterations for the DFP (sized DFP) update when $\epsilon = 10^{-4}$

From Theorems 4.1 and 5.1 and Corollary 5.1, the sized DFP, inverse-sized BFGS, optimal $\phi$, and inverse optimal $\phi$ updates are all equal in the case $n = 2$. (In fact, Proposition 2.1(iii) implies that they are also equal to the optimally conditioned sized symmetric rank-one update.) This update appears to be the best one in the tables above. We now decrease $\epsilon$. The following results suggest that nothing is lost asymptotically by using the $\omega$-optimal updates in the case $n = 2$.
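One way to see the $n = 2$ coincidence is the following. For a $2\times 2$ matrix $A$ with positive eigenvalues, $\omega(A^{-1}) = \omega(A)$. Moreover, $B_c H_+ = (B_+ H_c)^{-1}$ and $B_+ H_c$ has the same eigenvalues as $H_c B_+$. So the problems in Theorem 5.1 and Corollary 5.1 have the same objective and, by uniqueness, the same solution. The sketch below assumes NumPy, uses random data, and introduces hypothetical helper names. It compares the inverse-sized BFGS and sized DFP updates for $n = 2$ and $n = 4$, which illustrates the claim for one random instance only.

```python
import numpy as np

def bfgs_inverse(H, s, y):
    """Inverse BFGS: H + [(s - Hy)s^T + s(s - Hy)^T]/b - y^T(s - Hy) s s^T / b^2."""
    b = y @ s
    r = s - H @ y
    return H + (np.outer(r, s) + np.outer(s, r)) / b - (y @ r) * np.outer(s, s) / b**2

def dfp_direct(B, s, y):
    """Direct DFP: B + [(y - Bs)y^T + y(y - Bs)^T]/b - s^T(y - Bs) y y^T / b^2."""
    b = y @ s
    r = y - B @ s
    return B + (np.outer(r, y) + np.outer(y, r)) / b - (s @ r) * np.outer(y, y) / b**2

def sized_pair(Bc, s, y):
    """Return (inverse-sized BFGS B+, sized DFP B+) for the data (Bc, s, y)."""
    Hc = np.linalg.inv(Bc)
    a, b, c = y @ Hc @ y, y @ s, s @ Bc @ s
    B_bfgs = np.linalg.inv(bfgs_inverse((b / a) * Hc, s, y))   # Theorem 5.1
    B_dfp = dfp_direct((b / c) * Bc, s, y)                      # Corollary
    return B_bfgs, B_dfp

rng = np.random.default_rng(7)
agree = {}
for n in (2, 4):
    M = rng.standard_normal((n, n))
    Bc = M @ M.T + n * np.eye(n)
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if y @ s <= 0:
        y = -y                                                   # assume y^T s > 0
    B_bfgs, B_dfp = sized_pair(Bc, s, y)
    agree[n] = np.allclose(B_bfgs, B_dfp)

print(agree)   # expected {2: True, 4: False}
```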


λ₁ \ ψ₁   20 deg     40 deg     60 deg     70 deg     80 deg     85 deg     87 deg     88 deg
10        6 (9)      7 (6)      9 (6)      9 (5)      8 (5)      7 (7)      6 (7)      5 (8)
100       6 (9)      8 (6)      9 (7)      10 (8)     11 (8)     11 (7)     11 (7)     10 (7)
10⁴       6 (9)      8 (6)      10 (7)     11 (9)     12 (10)    14 (10)    15 (10)    15 (11)
10⁶       6 (9)      8 (6)      10 (7)     11 (9)     12 (10)    14 (11)    15 (11)    16 (12)
10⁹       6 (9)      8 (6)      10 (7)     11 (9)     12 (10)    14 (11)    15 (11)    16 (12)

Table 5.3. Number of iterations for the BFGS (sized DFP) update when $\epsilon = 10^{-6}$

λ₁ \ ψ₁   20 deg     40 deg     60 deg     70 deg     80 deg     85 deg     87 deg     88 deg
10        7 (10)     9 (7)      10 (7)     10 (7)     10 (6)     8 (5)      7 (8)      6 (9)
100       7 (10)     9 (7)      11 (7)     12 (7)     13 (9)     13 (9)     12 (8)     11 (8)
10⁴       7 (10)     9 (7)      11 (7)     12 (8)     14 (10)    15 (11)    16 (12)    17 (13)
10⁶       7 (10)     9 (7)      11 (7)     12 (8)     14 (10)    15 (11)    16 (12)    17 (12)
10⁹       7 (10)     9 (7)      11 (7)     12 (8)     14 (10)    15 (11)    16 (12)    17 (12)

Table 5.4. Number of iterations for the BFGS (sized DFP) update when $\epsilon = 10^{-9}$

We have also tested 22 methods on the standard set of 19 test problems from [4]; some of the results are given below. The methods are:

1. BFGS
2. optimal $\phi$
3. inverse optimal $\phi$
4. size at first step only and use optimal $\phi$
5. size at first step only and use inverse optimal $\phi$
6. inverse size at first step only and use optimal $\phi$
7. inverse size at first step only and use inverse optimal $\phi$
8. size at first step only, direct shift subsequently, and use optimal $\phi$
9. size at first step only, direct shift subsequently, and use inverse optimal $\phi$
10. inverse size at first step only, weak inverse update subsequently, and use optimal $\phi$
11. inverse size at first step only, weak inverse update subsequently, and use inverse optimal $\phi$

(The entries in the following two tables are iterations/function evaluations; *** indicates an error or lack of convergence.)


          Methods
Problem   1           2           3           4           5           6           7
1         9/11        6/8         5/8         6/8         46/48       6/8         46/48
2         29/38       36/43       40/47       29/33       40/44       29/34       50/55
3         37/48       41/45       39/47       45/48       53/59       45/49       56/66
4         3/5         4/8         3/5         4/9         5/9         4/9         5/9
5         149/191     ***         107/754     209/273     ***         208/275     ***
6         29/37       27/33       49/59       30/37       73/80       34/40       77/83
7         20/30       20/30       18/27       18/27       18/27       18/27       18/27
8         37/47       35/41       42/52       53/60       66/71       54/59       70/75
9         138/184     201/259     167/215     148/191     145/186     144/187     148/189
10        871/1131    865/1077    838/1064    253/292     233/263     276/323     237/273
11        ***         22/26       ***         22/26       ***         22/26       ***
12        23/48       ***         ***         31/40       ***         ***         56/65
13        30/42       ***         71/88       54/72       105/120     57/74       105/120
14        28/30       28/30       27/28       29/31       31/33       30/32       31/32
15        52/75       55/73       72/95       38/52       46/58       39/54       47/61
16        109/128     125/142     119/139     40/45       72/76       44/48       71/75
17        14/19       17/22       16/46       17/22       167/174     17/22       150/158
18        53/81       50/67       51/70       41/50       72/84       41/50       74/85
19        21/35       22/32       21/33       27/32       31/38       29/35       33/40
average   91.8/121.1  97.1/121.0  99.1/163.4  57.7/70.9   75.1/85.6   60.9/75.1   74.9/85.1

Table 5.5. Methods 1 through 7
          Methods
Problem   8           9           10          11
1         9/11        9/11        10/12       9/11
2         30/38       31/40       35/42       27/35
3         48/51       109/111     47/50       44/48
4         3/7         3/7         3/7         3/7
5         164/214     162/210     ***         150/200
6         34/40       44/51       44/50       33/40
7         18/27       18/27       18/27       18/27
8         53/58       129/133     58/63       52/57
9         148/193     ***         151/200     147/198
10        266/308     984/1030    270/310     271/325
11        12/16       12/16       13/17       ***
12        32/41       ***         ***         ***
13        52/64       63/73       65/77       46/64
14        30/32       54/55       30/31       29/31
15        38/56       40/56       ***         37/55
16        45/49       121/130     50/54       45/49
17        15/20       15/20       17/22       14/19
18        40/47       63/70       46/54       40/48
19        29/37       68/64       30/34       29/34
average   56.1/68.9   113.2/123.8 55.4/65.6   58.5/73.4

Table 5.6. Methods 8 through 11